1. UNIVARIATE DIRECTED GRAPHS


A directed graph consists of a set of g nodes and a set of directed arcs connecting pairs of
nodes.  Such  graphs  are  natural  mathematical  representations  of  biological  and  social  networks. 
They  are  also  used  in  various  other  applications  such  as  statistical  geography  and  transportation 
networks,  and  in  the  study  of  disease  contagion  using  acquaintance  networks.  For  a  social 
network  the  nodes  of  a  graph  may  represent  individuals,  groups,  or  even  organizations,  and  the 
arcs  correspond  to  relationships  or  choices  broadly  interpreted  to  represent  any  type  of  binary 
relationship.  It  is  customary  (e.g.  see  Harary,  Norman,  and  Cartwright,  1965)  to  use  an 
incidence  matrix  representation  of  directed  graphs.  Thus,  corresponding  to  each  graph  is  an 


adjacency matrix, $x = (x_{ij})$, such that

$$
x_{ij} =
\begin{cases}
1 & \text{if } i \text{ chooses } j, \\
0 & \text{otherwise,}
\end{cases}
\tag{1}
$$

where $x_{ii} = 0$.
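As a small illustration of the adjacency matrix in (1), the following sketch builds the 0/1 matrix from a list of choices. The function name `adjacency_matrix`, the zero-based node labels, and the three-node example are assumptions made for illustration only.

```python
def adjacency_matrix(g, arcs):
    """Return the g x g 0/1 adjacency matrix for a list of (i, j) arcs.

    Nodes are labelled 0, ..., g-1 (an assumption of this sketch).
    """
    x = [[0] * g for _ in range(g)]
    for i, j in arcs:
        if i == j:
            # Definition (1) fixes the diagonal at zero: no self-choices.
            raise ValueError("self-choices are excluded: x_ii = 0")
        x[i][j] = 1  # i chooses j
    return x


# Hypothetical example: nodes 0 and 1 choose each other, node 1 also chooses node 2.
arcs = [(0, 1), (1, 0), (1, 2)]
x = adjacency_matrix(3, arcs)
```

Row i of `x` records the choices made by node i, and column j records the choices received by node j.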


Holland  and  Leinhardt  (1975,  1979)  and  Frank  (1971,  1981)  summarize  the  historical 
development  of  random  graphs,  for  which  the  observed  adjacency  matrix  is  treated  as  the 
realization  of  a  matrix  random  variable,  X,  which  has  a  probability  distribution  on  the  set  of 
all directed graphs with g nodes. Typically, the observed features of an empirically constructed
directed  graph  are  compared  with  the  distribution  of  features  that  is  generated  by  some 
random  graph.  This  basic  idea  can  be  traced  back  in  the  social  science  literature  to  Moreno 
(1934). 


One  of  the  more  interesting  developments  in  the  modelling  of  directed  graphs  is  due  to 
Holland  and  Leinhardt  (1981),  who  begin  by  assuming  independence  of  relationships  amongst 
pairs  of  nodes  or  dyads.  Their  basic  model  can  be  represented  in  the  form 

$$
\begin{aligned}
\log \Pr[(1 - X_{ij})(1 - X_{ji}) = 1] &= \lambda_{ij}, \\
\log \Pr[X_{ij}(1 - X_{ji}) = 1] &= \lambda_{ij} + \alpha_i + \beta_j + \theta, \\
\log \Pr[(1 - X_{ij}) X_{ji} = 1] &= \lambda_{ij} + \alpha_j + \beta_i + \theta, \\
\log \Pr[X_{ij} X_{ji} = 1] &= \lambda_{ij} + \alpha_i + \alpha_j + \beta_i + \beta_j + 2\theta + \rho.
\end{aligned}
\tag{2}
$$


The parameter $\lambda_{ij}$ is required for normalization purposes (each dyad must be in one of the
four possible states), $\{\alpha_i\}$ and $\{\beta_j\}$ are effects that measure the productivity and attractiveness
of the nodes, $\theta$ is a choice parameter, and $\rho$ is a measure of "reciprocity." Note that model (2)
is  loglinear  in  structure.  Holland  and  Leinhardt  present  iterative  methods  for  maximum 
likelihood  estimation  for  the  parameters  in  this  model  (see  the  discussion  of  the  estimation  of 
parameters  in  loglinear  models  for  categorical  data  in  CONTINGENCY  TABLES),  and  Fienberg 
and Wasserman (1981a) provide an alternative approach based on a simple transformation of
the data and the use of the method of iterative proportional fitting*. They also suggest
several generalizations of the Holland-Leinhardt model where, for example, the parameter $\rho$ in
expression (2) is replaced by

$$
\rho_{ij} = \rho + \rho_i + \rho_j, \tag{3}
$$

where $\sum_i \rho_i = 0$, and demonstrate how the parameters of this model can also be estimated by
iterative proportional fitting.
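To make the role of the normalizing parameter in model (2) concrete, the sketch below computes the four dyad-state probabilities for one pair of nodes from given parameter values. It is an illustration only, not the estimation procedure of Holland and Leinhardt or of Fienberg and Wasserman; the function name `dyad_probabilities` and the numerical values are assumptions.

```python
import math


def dyad_probabilities(alpha_i, alpha_j, beta_i, beta_j, theta, rho):
    """Return (lambda_ij, probabilities) for the four states of dyad (i, j).

    The log-probabilities follow the loglinear form of model (2); lambda_ij
    is chosen so that the four probabilities sum to one.
    """
    # Log-weights of each state before normalization (the null state is 0).
    log_w = {
        "null": 0.0,
        "i_to_j": alpha_i + beta_j + theta,
        "j_to_i": alpha_j + beta_i + theta,
        "mutual": alpha_i + alpha_j + beta_i + beta_j + 2 * theta + rho,
    }
    lam = -math.log(sum(math.exp(v) for v in log_w.values()))
    probs = {state: math.exp(lam + v) for state, v in log_w.items()}
    return lam, probs


# Hypothetical values: no node or choice effects, positive reciprocity.
lam, probs = dyad_probabilities(0.0, 0.0, 0.0, 0.0, 0.0, math.log(3.0))
```

With every parameter at zero the four states are equally likely; raising the reciprocity parameter shifts probability toward the mutual state, here to one half.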


Two  outstanding  theoretical  statistical  problems  in  connection  with  the  Holland  and  Leinhardt 
univariate  model  and  its  generalizations  are  (i)  the  lack  of  an  appropriate  asymptotic 
framework  for  inference  (see  the  discussions  in  Fienberg  and  Wasserman  (1981b)  and  Haberman 
(1981))  which  is  needed  to  carry  out  goodness-of-fit  tests,  and  (ii)  the  need  for  alternative 
models  which  allow  for  dyadic  dependence  and  include  the  Holland-Leinhardt  model  as  a 
special  case. 


2.  MULTIVARIATE  DIRECTED  GRAPHS 


A  multivariate  directed  graph  is  simply  a  collection  of  univariate  directed  graphs  with  the 
same  g  nodes.  (The  term  multi-graph  is  also  in  wide-spread  use.)  If  there  are  R  such 
univariate graphs, then we represent the multivariate graph by the collection of adjacency
matrices for the R univariate graphs, $(x_1, \ldots, x_R)$. We may think of the R graphs as
representing either R different types of relationships amongst the g nodes, or the same relationship
at R different points in time. In either case, we wish to think of an observed multivariate
graph as a realization of a random multivariate graph $X = (X_1, X_2, \ldots, X_R)$.

In  the  univariate  situation  we  saw  that  each  dyad  had  four  possible  realizations: 

(1,1) : arcs in both directions,

(1,0) or (0,1) : arc in one direction,

(0,0) : no arc.

Now each dyad has $2^{2R}$ possible realizations.

Fienberg,  Meyer,  and  Wasserman  (1981)  have  proposed  a  class  of  loglinear  models  for 
random  multivariate  directed  graphs,  that  generalize  some  aspects  of  the  Holland-Leinhardt 
model to the multivariate case. By sacrificing the node-level parameters, $\{\alpha_i\}$ and $\{\beta_j\}$,
associated  with  each  univariate  graph,  these  models  incorporate  not  only  reciprocity  effects  for 
dyadic  patterns  of  the  form: 


        X <-------> Y ,
         Relation r

but  also  exchange  effects  for  patterns  of  the  form: 

         Relation r_1
          ------->
        X          Y .
          <-------
         Relation r_2

and  multiplex  choice  effects  for  patterns  of  the  form: 


         Relation r_1
          ------->
        X          Y .
          ------->
         Relation r_2

as  well  as  multivariate  generalizations  of  these  effects. 

Although there are $2^{2R}$ possible dyadic realizations we only get to observe

$$
2^R + 2^R(2^R - 1)/2
$$

states when the nodes lose their individual identities. We can still summarize the data in the
adjacency matrices of the multivariate graph by counting every dyad twice, once from the
perspective of each node. As a consequence we end up with a $2^{2R}$ table of counts, with the
$2^R$ cells corresponding to reciprocal arcs on each relation (both present or both absent)
containing double the actual number of dyads, and each of the remaining $2^{R-1}(2^R - 1)$ patterns
yielding two symmetrically placed duplicate counts in the table.

Fienberg, Meyer, and Wasserman (1981) show how fitting a simple affine translation of a
loglinear model for the $2^R + 2^{R-1}(2^R - 1)$ counts corresponds to fitting standard loglinear models
to the $2^{2R}$ table of duplicated and doubled counts.
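The double counting described above can be sketched as follows: every ordered pair of nodes contributes one count to the cell indexed by its pattern of sent and received arcs on each relation. The function names `dyad_table`, `mirror`, and `n_distinct_states`, the cell ordering (sent, received) within each relation, and the two small matrices are assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def dyad_table(matrices):
    """Count each dyad twice, once from the perspective of each node.

    Each cell is a tuple (x_1ij, x_1ji, ..., x_Rij, x_Rji) of length 2R.
    """
    g = len(matrices[0])
    table = Counter()
    for i, j in permutations(range(g), 2):  # all ordered pairs with i != j
        pattern = tuple(v for x in matrices for v in (x[i][j], x[j][i]))
        table[pattern] += 1
    return table


def mirror(pattern):
    """Swap sent and received arcs within each relation (the other node's view)."""
    return tuple(
        v for k in range(0, len(pattern), 2) for v in (pattern[k + 1], pattern[k])
    )


def n_distinct_states(R):
    """Number of dyad states once node identities are dropped."""
    return 2**R + 2 ** (R - 1) * (2**R - 1)


# Hypothetical bivariate graph (R = 2) on g = 3 nodes.
x1 = [[0, 1, 0], [1, 0, 1], [0, 0, 0]]
x2 = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]
table = dyad_table([x1, x2])
```

In this example the single dyad with no arcs on either relation appears with a doubled count in its cell, while each asymmetric pattern shares its count with its mirror image, as the text describes.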


3.  TWO  EXAMPLES 

Holland  and  Leinhardt  (1981)  illustrate  their  univariate  model  on  data  collected  by  Sampson 
(1969), who spent a year observing monks in an American monastery. Sampson measured both
negative and positive relationships on four dimensions at five different points in time. The
same 18 monks were interviewed at three of these time points. Thus the data can be
represented in the form of an R = 4 × 2 × 3 = 24-variate directed graph involving 18 nodes.
Holland  and  Leinhardt  analyze  only  a  single  relationship  from  this  data-set. 

Galaskiewicz  and  Marsden  (1978)  and  Fienberg,  Meyer,  and  Wasserman  (1981)  describe  data 
from a study of the formal organizations in a small midwestern U.S. community of 32,000 persons
referred to by the pseudonym "Towertown." They focus their analyses on a subset of 73
organizations and their links on three relations: (1) information, (2) money, and (3) support.
Thus the original data take the form of three 73 × 73 adjacency matrices, but the analyses focus
on a summary of these in the form of a $2^6$ table of counts of pairs of organizations. The
full adjacency matrices are available in Fienberg and Galaskiewicz (1982). The most substantial
estimated effects in the loglinear models fitted by Fienberg, Meyer, and Wasserman are
associated with choices ($\theta$'s), reciprocity ($\rho$'s), and a multiplex-reciprocity effect associated with
the  dyadic  pattern: 

         Information
          <------>
        X          Y
          <-------
         Support


4.  SOME  RELATED  STATISTICAL  APPROACHES 
In a pair of related papers, White, Boorman, and Breiger (1976) and Boorman and White
(1976) proposed a method, labelled blockmodelling, for the analysis of data in the form of
multivariate directed graphs. A block model for a network consists of a partition of the nodes
into blocks of structurally equivalent nodes (i.e. ones which relate in the same way to all other
nodes  in  the  network),  and  corresponds  to  a  deterministic  rather  than  a  stochastic  model. 
Unfortunately,  few  directed  graphs  yield  exactly  to  such  blockmodels,  and  substantive  social 
science theory does not always suggest appropriate partitions. Thus White, Boorman, and
Breiger suggested the use of a statistical-like approach to the search for an "acceptable" block
model of a particular form, and they demonstrated their approach on Sampson's monastery data
described  above. 
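As a rough illustration of why few graphs fit a block model exactly, the sketch below tests whether a proposed partition yields an exact fit for a single relation. It assumes one way of formalizing exactness, namely that every block-by-block submatrix of the adjacency matrix is constant (all zeros or all ones) once diagonal entries are ignored. The function name `is_exact_blockmodel` and the example data are hypothetical, and this is not the search procedure of White, Boorman, and Breiger.

```python
def is_exact_blockmodel(x, blocks):
    """Return True if each block-by-block submatrix of x is all 0s or all 1s.

    x is a 0/1 adjacency matrix; blocks is a partition of the node indices.
    Diagonal entries are ignored because self-choices are not defined.
    """
    for rows in blocks:
        for cols in blocks:
            values = {x[i][j] for i in rows for j in cols if i != j}
            if len(values) > 1:  # a mixed block: the fit is not exact
                return False
    return True


# Hypothetical example: nodes 0 and 1 both choose nodes 2 and 3, and nothing else.
x = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
blocks = [[0, 1], [2, 3]]
exact = is_exact_blockmodel(x, blocks)

# Removing a single arc is enough to break the exact fit.
x_noisy = [row[:] for row in x]
x_noisy[0][2] = 0
noisy = is_exact_blockmodel(x_noisy, blocks)
```

A single missing arc turns an exact fit into a failure, which is one way to see why some tolerance for chance variation is needed in practice.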


Breiger, Boorman, and Arabie (1976) describe a more general search procedure for a block
model structure, based on hierarchical clustering* methods, and apply their method to a
study of directorship interlocks in American industry. These methods are closely related to
other exploratory statistical procedures for row-column permutations of a matrix, such as
nonmetric multidimensional scaling* (see Arabie, Boorman, and Levitt (1978)).

Major drawbacks of blockmodel methods include: (i) their inexplicit use of formal parametric
models, (ii) the use of arbitrary criterion functions for the choice of partitions, and (iii) the
inability to distinguish actual structure from chance variation. Their major advantage is that
they  provide  an  explicit  model  for  the  pattern  of  responses,  which  many  sociometricians  find 
very  useful  for  thinking  about  sociological  theory  (see  Light  and  Mullins  (1979)). 